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Abstract 

We consider the problem of computing Li-distances between every pair of prob- 
ability densities from a given family. We point out that the technique of Cauchy 
random projections |Ind06j in this context turns into stochastic integrals with respect 
to Cauchy motion. 

For piecewise-linear densities these integrals can be sampled from if one can sample 
from the stochastic integral of the function x i— > (l,x). We give an explicit density 
function for this stochastic integral and present an efficient (exact) sampling algorithm. 
As a consequence we obtain an efficient algorithm to approximate the Li-distances 
with a small relative error. 

For piecewise-polynomial densities we show how to approximately sample from the 
distributions resulting from the stochastic integrals. This also results in an efficient 
algorithm to approximate the Li-distances, although our inability to get exact samples 
worsens the dependence on the parameters. 

1 Introduction 

Consider a finite class T = {fi, f2, ■ ■ ■ , fm} of probability densities. We want to compute 
the distance between every pair of members of T. We are interested in the case where each 
member of J 7 is a mixture of finitely many probability density functions, each having a 
particular functional form (e.g., uniform, linear, exponential, normal, etc.). Such classes 
of distributions are frequently encountered in machine learning (e.g., mixture models, 
sec [Bis06]) and nonparametric density estimation (e.g., histograms, kernels, see |DL01| ). 
The number of distributions in a mixture gives a natural measure of complexity which we 
use to express the running time of our algorithms. 

For some classes of distributions exact algorithms are possible, for example, if each 
distribution in T is a piecewise linear function consisting of n pieces then we can compute 
the distances between all pairs in time 0(m 2 n). For other classes of distributions (for 
example, mixtures of normal distributions) exact computation of the distances might not 
be possible. Thus we turn to randomized approximation algorithms. A (5, e) -relative- error 
approximation scheme computes Dj k , j,k £ [m] such that with probability at least 1 — 6 
we have 

(Vj, k E [m]) (1 - e)D jk < ||/, - AHi < (1 + e)D jk . 



A (5, e)- absolute- error approximation scheme computes Dj k , j,k £ [m] such that with 
probability at least 1 — 5 we have 

(Vj, k G [m]) D jk - e < \\fj -fh\\i< D jk + e. 

A direct application of the Monte Carlo method ( [MU49| . see [Met 8 7] ) immediately 
yields the following absolute-error approximation scheme. Let Xj k be sampled according 
to fj and let Yj k = sgn(fj(Xj k ) — f k (Xj k )), where sgn : M — > {—1, 0, 1} is the sign function. 
The expected value of Yj k + Y k j is equal to — /fc||i, indeed 

E[Y jk + Y kj ] = J (fj(x) - f k (x)) sgn(/,(x) - f k (x)) dx = \\fj - / fe ||i. 

Thus, to obtain a (<5, e)-absolute-error approximation scheme it is enough to approxi- 
mate each Yj k with absolute error e/2 and confidence 1 — 5/m 2 . By the Chernoff bound 
0(e~ 2 ln(m 2 /5)) samples from each Yj k are enough. (The total number of samples from 
the fj is 0(me~ 2 ln(m 2 /<5), since we can use the same sample from fj for Yj\, . . . ,Yj m . 
The total number of evaluations is 0(m 2 e~ 2 ln(m 2 /S).) The running time of this algo- 
rithm will compare favorably with the exact algorithm if sampling from the densities and 
evaluation of the densities at a point can be done fast. (For example, for piecewise linear 
densities both sampling and evaluation can be done in 0(log n) time, using binary search.) 
Note that the evaluation oracle is essential (cf. [BFR + 00] who only allow use of sampling 
oracles) . 

In the rest of the paper we will focus on the harder relative-error approximation schemes 
(since the Li-distance between two distributions is at most 2, a relative-approximation 
scheme immediately yields an absolute-error approximation scheme). Our motiv ation 
comes from an application (density estimation) which requires a relative-error scheme [MS07 . 

Now we outline the rest of the paper. In Section [2] we review Cauchy random pro- 
jections; in Section [3] we point out that for density functions Cauchy random projections 
become stochastic integrals; in Section |4] we show that for piecewise linear functions we 
can sample from these integrals (using rejection sampling, with bivariate student distri- 
bution as the envelope) and as a consequence we obtain efficient approximation algorithm 
for relative-error all-pairs-Li-distances. Finally, in Section [5l we show that for piecewise 
polynomial functions one can approximately sample from the integrals, leading to slightly 
less efficient approximation algorithms. 

2 Cauchy random projections 

Dimension reduction (the most well-known example is the Johnson-Lindenstrauss lemma 
for L2-spaces |JL84| ) is a natural technique to use here. We are interested in Li-spaces for 
which the analogue of the Johnson-Lindenstrauss lemma is not possible [BC05 , NL04] (that 
is, one cannot project points into a low dimensional Li-space and preserve distances with 
a small relative error). However one can still project points to short vectors from which 
Li-distances between the original points can be approximately recovered using non-linear 
estimators [LHC071 ITndnH] . 

A particularly fruitful view of the dimensionality "reduction" (with non-linear estima- 
tors) is through stable distributions ([JS82, Ind06j): given vectors i>i,... ,v m one defines 
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(dependent) random variables X±, . . . ,X m such that the distance of Vj and Vk can be re- 
covered from Xj — Xk (for all j, k G [m]). For example, in the case of Li-distances Xj —X^ 
will be from Cauchy distribution C(0, \\vj — Vk\\i), and hence the recovery problem is to 
estimate the scale parameter R of Cauchy distribution C(0, R). This is a well-studied 
problem (see, e. g., [HBA70 ). We can, for example, use the following nonlinear estimator 
(other estimators, e.g., the median are also possible [Ind06| ) : 

Lemma 2.1 (Lemma 7 of [LHC07J). Let Xi, X2, ■ ■ ■ , Xt be independent samples from the 
Cauchy distribution C(0, D). Define the geometric mean estimator without bias- correction 

t 

D gm = l[\X j \ 1 / t . 

i=i 

Then for each e G [0, 1/2], we have 

P(p 9 m G [(1 - s)D, (1 + e)Z>]) > l-2exp(-te 2 /8). 

We first illustrate how Cauchy random projections immediately give an efficient relative- 
error approximation scheme for piecewise uniform distributions. 

Let T consist of m piecewise uniform densities, that is, each member of T is a mixture 
of n distributions each uniform on an interval. Let a\,...,a s be the endpoints of all the 
intervals that occur in T sorted in the increasing order (note that s < 2mn). Without 
loss of generality, we can assume that each distribution fj £ J 7 is specified by n pairs 
(bji, Cji), . . . , (bj n , Cj n ) where 1 < bj\ < Cj± < ■ ■ ■ < bj n < Cj n < s, and for each pair 
(bj£, Cji) we are also given a number a.j£ which is the value of fj on the interval [a,b- e ,a c . e ). 

Now we will use Cauchy random projections to compute the pairwise Li-distances 
between the fj efficiently. For t G {1, . . . , s — 1} let Zi be independent from the Cauchy 
distribution C(0, a^+i — a^). Let Yf = Z% + ■ ■ ■ + for t = 1, . . . , s. Finally, let 

n n 

Xj := " Y hjt ) = £ a jt (Z b . t + ■■■ + Z c .^). (1) 

e=i e=i 

Note that Xj is a sum of Cauchy random variables and hence has Cauchy distribution 
(in fact it is from C(0, 1)). Thus Xj — X^ will be from Cauchy distribution as well. The 
coefficient of in Xj — X^ is the difference of fj and on interval [ai, ag+i)- Hence the 
contribution of Zi to Xj — X^ is from Cauchy distribution C(0, f^ +1 \fj(x) — fk( x ) \ dx), 
and thus Xj — X^ is from Cauchy distribution C(0, \\fj — /fc||i). 

Remark 2.2. In the next section we will generalize the above approach to piecewise 
degree-<i-porynomial densities. In this case for each (bjg, Cji) we are given a vector ctj£ G 
M. d+1 such that the value of fj on interval [ab je ,a c . e ) is given by the following polynomial 
(written as an inner product): 

fj(x) = (1,X, ...,X d )- OLjl. 
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3 Cauchy motion 



A natural way of generalizing the algorithm from the previous section to arbitrary den- 
sity functions is to take infinitesimal intervals. This leads one to the well-studied area 
of stochastic integrals w.r.t. symmetric 1-stable Levy motion (also called Cauchy mo- 
tion). Cauchy motion is a stochastic process {X(t),t £ R} such that -X"(0) = 0, X 
has independent increments (i.e., for any t\ < t% < ••• < t& the random variables 
X(t-2) — X(ti), . . . ,X(tk) — X(tf;-i) are independent), and X(t) — X(s) is from Cauchy 
distribution C(0, \t — s\). Intuitively, stochastic integral of a deterministic function w.r.t. 
Cauchy motion is like a regular integral, except one uses X{t) — X(s) instead of t — s for 
the length of an interval (see section 3.4 of [ST94J for a readable formal treatment). 

We will only need the following basic facts about stochastic integrals of deterministic 
functions w.r.t. Cauchy motion (which we will denote dC(x)), see IS I 9 1 . Chapter 3. 

Fact 3.1. Let f : R — > R be a (Riemann) integrable function. Let X = J f(x) dC(x). 
Then X is a random variable from Cauchy distribution C(0,R) where 

R= [ b \f(x)\dx. (2) 



Fact 3.2. Let fi, . . . , fj : R — ► R be (Riemann) integrable functions. Let <fi = (/i, . . . , f^) ■ 
R — ► R d . Let (Xi, . . . , Xd) = J b 4>(x) dC{x). Then (Xi, . . . , Xj) is a random variable with 
characteristic function 

f(ci, ...,c d ) = exp (- J \c\f\{x) H Vc d f d (x)\ dx^j . 

Fact 3.3. Let f, g : R — > R be (Riemann) integrable functions. Let a < b, a, (3 E R. Then 

[\af + f3g)d£(x)=a f f 6C{x) + p f gdC{x). 

J a J a J a 

Let h(x) = f(a + (b — a)x). Then 

b rl 

f{x)dC{x) = (b- a) / h{x)dC{x). 
Jo 

From facts 13.11 and 13.31 it follows that the problem of approximating the Li-distances 
between densities can be solved if we can evaluate stochastic integrals w.r.t. Cauchy 
motion; we formalize this in the following observation. 

Observation 3.4. Let /i, . . . , f m : R — > R be probability densities. Let (p ■ R — ► R m be 

defined by 4>(x) = (fi(x), . . . , f m (x)). Consider 



(X 1 ,...,X m )= / <j>(x)d£(x). (3) 

J — oo 

For all j, k € [m] we have that Xj — X^ is from Cauchy distribution C(0, \\fj — /fe||i). 
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Note that the Xj defined by (pQ) are in fact computing the integral in ([3]) . For piecewise 
uniform densities it was enough to sample from the Cauchy distribution to compute the 
integral. For piecewise degree-d-polynomial densities it will be enough to sample from the 
following distribution. 

Definition 3.5. Let <f> : R -» R d+1 be defined by <j)(x) = (l,x,x 2 , . . .,x d ). Let CI d (a,6) 
be the distribution of Z, where 

Z:= (Z ,...,Z d ) := f 4>{x)dC{x). 

J a 

Note that given a sample from CLj(0, 1), using 0(d 2 ) arithmetic operations we can 
obtain a sample from CLj(a, b), using Fact 13.31 

Lemma 3.6. Let J 7 consist of m piecewise degree-d-polynomial densities, each consisting 
of n pieces (given as in Remark \2.2\) . Let t > (8/e) 2 ln(m 2 /5) be an integer. Assume 
that we can sample from CLj(0, 1) using T d operations. We can obtain (5, e) -relative- 
error approximation of L\- distances between all pairs in T , using 0{(d 2 + Tyrant + m 2 t) 
arithmetic operations. 

Proof : 

For £ £ {1, ...,s — 1} let Zi be independent from CLj(e^, o^+i) distribution. Let Yt = 
Z\ ^ + Zt-\, for £ = 1, . . . , s. Finally, for each j £ [m], let 

n n 

Xj := <*,l ■ (Xc je ~ Y bjl ) = CL jt ■ {Z bjt + ■■■ + Z C]l . x ). (4) 
i=\ i=\ 

Note that Y c . t — Y b]l is from C {a bjl , a Cje ) and hence 

otji ■ (Y Cjl - Y bjl ) = I f,j{x) dC(x). 

Thus (Xi, . . . , X m ) defined by (jH) compute <^j. 

For every j, k G [m] we have that Xj — X^ is from Cauchy distribution 

C(0,\\fj-f k \\i). 

If we have t samples from each X\, . . . , X m then using Lemma 12. II and union bound with 
probability > 1 — 5 we recover all \\fj — fh\\\ with relative error e. 

Note that s < 2mn and hence for the Zi we used < 2mnt samples from CI(0, 1) 
distribution, costing us 0((d 2 + T^mnt) arithmetic operation. Computing the Yi takes 
0(mnt) operations. Computing the Xj takes 0(mnt) operations. The final estimation of 
the distances takes 0(m 2 t) operations. ■ 

4 Piecewise linear functions 

The density function of Cli(0, 1) can be computed explicitly, using the inverse Fourier 
transform; the proof is deferred to the appendix. The expression for the density allows us 
to construct efficient sampling algorithm, which in turn yields an efficient approximation 
algorithm for all-pairs-Li-distances for piecewise linear densities. We obtain the following 
result. 



5 



Theorem 4.1. Let T consist of m piecewise linear densities, each consisting of n pieces 
(given as in Remark \2.£\) . We can obtain (5, e) -relative- error approximation of L\- distances 
between all pairs in J 7 , using 0(m(m + n)e~ 2 \n(m/6)) arithmetic operations. 

Now we state the density of Cli(0, 1). In the following denotes the real part of a 
complex number x. 



Theorem 4.2. Let 



M 2 be the function 4>(x) = (l,ac). Let 

Z = (X U X 2 ) = f 1 4>(z)d£(z). 
Jo 



For x\ 7^ 2x2 the density function of Z is given by 
f(xi,x 2 ) 

where 



2 / atan(iQ/(xi -2x 2 )) \ 

1 + 6x 2 + x\ - 16xix 2 + l§x 2 2 it 2 V /> : ' 



Q 3 / 2 



Q = 1 — 2ix\ + x 1 + 4ix 2 . 
For x\ = 2x2 the density is given by 



f(xi,x 2 ) 



4/tt 2 



+ 



1 



(1+x 2 ) 2 vr(l + x 2 ) 3 /2- 



(6) 
(7) 




Figure 1: The density plot of (Xi,X 2 ) = (1, z) d£(z). The contours are at levels 



-15 o-14 



Next we show how to efficiently sample from the Cli(0, 1) distribution by rejection 
sampling using the bivariate student distribution as the envelope. 
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Let S be a positive-definite 2x2 matrix. The bivariate student distribution with 1 
degree of freedom is given by the following formula (see, e.g., [ESQO], p. 50) 

It is well-known how to sample X from this distribution: let X = S 1 / 2 Y /VW \ where 
Y, W are independent with Y ~ A^(0, J) (the two dimensional gaussian) and W ~ X 2 (l) 
(chi-squared distribution with 1 degree of freedom). 

We are going to use the bivariate student distribution with the following density 

g^) = -(l + x l + (2x 2 -x 1 ) 2 y 3/2 . (8) 

We show that the density function of the Cli(0, 1) distribution is bounded by a constant 
multiple of (J8j) (the proof is deferred to the appendix). 

Lemma 4.3. Let /(x) be given by ([5]) and ([7|). Let g(x) be given by ([8]). For every x £ R 2 
we have 

/(x)<-- 5 (x), 

where C = 25. 

As an immediate corollary of Lemma 14.31 we obtain an efficient sampling algorithm for 
Cli(0, 1) distribution, using rejection sampling (see, e.g., [ES00J). 

Corollary 4.4. There is a sampler from Cli(0, 1) which uses a constant number of samples 
from from -/V(0, 1) and x 2 (l) expectation). 

Proof of Theorem I4.lt 

The theorem follows from Corollary 14.41 and Lemma 13.61 ■ 

Remark 4.5. Lemma 14.31 is true with C = 7r2 3 / 2 (we skip the technical proof). The 
constant 7r2 3 / 2 is tight (see equation (flOj) with a — > and T — » 1). 



5 Piecewise polynomial functions 

Some kernels used in machine learning (e.g., the Epanechnikov kernel, sec [DLOlJ, p. 85) 
are piecewise polynomial. Thus it is of interest to extend the result from the previous 
section to higher-degree polynomials. 

For d > 1 we do not know how to sample from distribution CI^(0, 1) exactly. However 
we can still approximately sample from this distribution, as follows. Let r be an integer. 
Let Zi, . . . , Z r be independent from Cauchy distribution C(0, 1/r). Consider the following 
distribution, which we call r-approximation of CLj(0, 1): 

r 

(X , ...,X d ) = Y,Zy (1, UM, (j/r) 2 , (j/r) d ). (9) 

3=1 

Now we show that if r is large enough then the distribution given by Q can be used 
instead of distribution CLj(0, 1) for our purpose. As a consequence we will obtain the 
following. 
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Theorem 5.1. Let T consist ofm piecewise degree-d-polynomial densities, each consisting 
of n pieces (given as in Remark \2.Sfy . We can obtain (5, e) -relative- error approximation of 
Li-distances between all pairs in J 7 , using 0{m(m + n)d 3 £~ 3 ln(m/6)) arithmetic opera- 
tions. 

Remark 5.2. Note that for d = 1 Theorem 15.11 gives worse (in e) running time that 
Theorem 14.11 This slowdown is caused by the additional integration used to simulate 
CI d (0,l). 

The proof of Theorem 15.11 will be based on the following result which shows that Q 
is in some sense close to CLj(0, 1). 

Lemma 5.3. Let p = a$ + a\x + • • • + a,jX d be a polynomial of degree d. Let (Xq, . . . , X^) 
be sampled from the distribution given by with r > cd 2 /e (where c is an absolute 
constant). Let W = oqXq + • • • + a^X^. Then W is from the Cauchy distribution C(0, R), 
where 



(10) 



(1-e) f \p(x)\dx< R< (1 + e) / \p(x)\dx. 
Jo Jo 

We defer the proof of Lemma 15.31 to the end of this section. Note that having (jlOh 
instead of (|2|) (which sampling from CLj(0, 1) would yield) will introduce small relative 
error to the approximation of the Li-distances. 

Proof of Theorem I5.lt 

The proof is analogous to the proof of Lemma 13.61 Let r > cd 2 je. For £ £ {1, . . . , s — 1} 
let be independent from r-approximation of CI^(a£, a^+i) distribution. Let Yi = Z\ + 
• • • + for I = 1, . . . , s. Finally, for each j G [m], let 

n n 
X 3 := Yl ' ( Yc ^ - Yb J^ = a H • ( Zb Jl + ■■■ + Z c 3l -l)- 

By Lemma 15.31 for every j, k S [m] we have that Xj — X), is from Cauchy distribution 
C(0,R) where (1 - e)^ - / fe ||i < R < (1 + e)^ - f k \\ x . 

If we have t > (8/e) 2 ln(m 2 /S) samples from each X\, . . . , X m then using Lemma 12. II 
and union bound with probability > 1 — 5 we recover all \\fj — fk\\\ with relative error 
« 2e. 

Note that s < 2mn and hence for the Zg we used < 2mnt samples from r-approximation 
of CI(0, 1) distribution, costing us 0((d 3 /e)mnt) arithmetic operation. Computing the 
Y( takes 0(mnt) operations. Computing the Xj takes 0(mnt) operations. The final 
estimation of the distances takes 0(m 2 t) operations. ■ 

To prove Lemma 15.31 we will use the following Bernstein- type inequality from [[ErdOOj. 

Theorem 5.4. (Theorem 3.1 of JErdOO^ ) There exists a constant c > such that for any 
degree d polynomial p, 

\p'(x)\dx < cd 2 / \p(x)\dx. 



J 

Jo 
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We have the following corollary of Theorem [5 
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Lemma 5.5. There exists a constant c such that for any polynomial p of degree d, any 
r > cd 2 , any = xq < x\ < X2, ■ ■ ■ < xt = 1 with maxj \xj — Xj~\\ < 1/r, and any 
6i G [x o ,x 1 ],0 2 G [xi,x 2 ], ■ ■ ■ ,0t G [xi_i,x t ], we have 

r-l t [-1 

(l-cd 2 /r) \p(x)\dx < J2(xj - Xj-x)\p{9j)\ < (1 + cd 2 /r) / \p(x)\dx. (11) 
Proof : 

We will use induction on the degree d of the polynomial. For d = the sum and the 
integrals in (jlip are equal. 

Now assume d > 1. For each j G [i], we use the Taylor expansion of p{x) about 6j for 
x G (xj-i,Xj]. This yields for each x G (xj-i, Xy], p[x) = + (x — Gj)p'(0'j x ), where 
9j x G (xj_i, Xj]. Let f3j be the point y G (xj-i, Xj] that maximizes p'(y). We have 



^(x.-x^ik^)!- f\p{x)\dx <J2 P \p(x)-p(e,)\dx 

i—1 i-1 ^^i-l 

3-1 3-1 (12) 

<^/ ' |(x-^y(^J|dx<-^(x J -x J _ 1 )|p / (/3,)|. 

1=1 ^-l zr 1=1 



Since p' is of degree d — 1, by induction hypothesis the right-hand side of (|12p is bounded 
as follows 

Y r 5>J ~ *i-i)WiPi)\ <Yr {l + 0{d ~ l)2£) Si |p ' (x)|dX 

< (1/r) / |p'(x)|dx < (cd 2 /r) / |p(x)|dx. 
Jo Jo 

where in the last inequality we used Theorem 15.41 Hence the lemma follows. ■ 

Proof of Lemma 15. 3t 

We have 

r r 
W = (ao, • • • ,a d ) ■ U/r), (j/r) 2 ,. . . , (j/r) d ) = £ Z jP (j/r), 

3=1 3=1 

where Zj are from Cauchy distribution C(0, 1/r). Thus W is from Cauchy distribution 
C(0,R), where 

R=\jZ\p{3/r)\. 

3=1 

Using Lemma 15.51 we obtain (llOp . ■ 

Remark 5.6. An alternate view of Lemma 15.51 is that a piecewise degree-cf-polynomial 
density with n pieces can be approximated by a piecewise uniform density with 0{nd 2 je) 
pieces. The approximation distorts Li-distances between any pair of such densities by a 
factor at most lie. To obtain a relative-approximation of the Li-distances in a family T 
one can now directly use the algorithm from Section [2] without going through the stochastic 
integrals approach (for d = 1 the price for this method is a 1/e factor slowdown) . 
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Remark 5.7. (on /^-distances) For L2-distances the dimension reduction uses normal 
distribution instead of Cauchy distribution. For infinitesimal intervals the corresponding 
process is Brownian motion, which is much better understood than Cauchy motion. Eval- 
uation of a stochastic integral of a deterministic function R — > M. d w.r.t. Brownian motion 
is a d-dimensional gaussian (whose covariance matrix is easy to obtain), for example 

(1,2,... ,X d )dC B rown(x) 

is from N(0, S) where S is the (d + 1) x (d + 1) Hilbert matrix (that is, the ij-th entry of 
Sis + 

Question 5.8. How efficiently can one sample from CLj(0, 1) distribution? A reasonable 
guess seems to be that one can sample from a distribution within L\-distance 5 from 
CLj(0, 1) using d 2 ln(l/<5) samples. 
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6 Appendix 

6.1 Stochastic integral of (constant, linear) function 

In this section we give an explicit formula for the density function of the random variable 

(X,Y)= C cp(z)dC{z), 
Jo 

where (f>(z) = (1, z), and dC(z) is the Cauchy motion. 

We will obtain the density function from the characteristic function. The following 
result will be used in the inverse Fourier transform. (We use 9ft to denote the real part of 
a complex number.) 

Lemma 6.1. Let (f) = (</>i, . . . , cj) n ) : R -> W a . Let 



Z — (Xi, . . . , X n ) — I 
Jo 



4>(x) dC(x), 



where C is the Cauchy motion. The density function f of Z is given by 

( (n-1)! r r 2 db . . . db \ n 3 ) 

v (2tt)» y_oo y_oo (a + iBY T y } 

A = A(b 1 ,...,b n - 1 ):= [ + - + (14) 

Jo 

and 

B = B(bi, &„_i, xi, . . . , x n ) := Mi H h 6„_ix n _i + x n . (15) 



where 
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Proof : 

The characteristic function of Z is (see, e.g., proposition 3.2.2 of [ST94]): 

/(»!, ...,a n ) = £'[exp(f(aiXi H h a n X n ))] = exp ^- ^ K^x) H h a n n (x)M 

We will use the following integral, valid for any A > (see, e. g., [G R07] ): 



CO 







t«- eM-M) cos(Bt) dt = ^ (^-^ + ) . (16) 



We would like to compute the inverse Fourier transform of /, which, since / is symmetric 
about the origin, is given by 

2 roc roo roo 

f(xi, . . . ,x n ) = / / ... / /(oi,... j On) cos(oi»H \-a n x n )dai . . . da n -ida n 

(27r) n Jo y_ 00 y_ oc 

(17) 

Substitution a n = t, a n _i = 6 n _it, . . . , ai = &it into (fTT)) yields 

f(x%, . . . , x n ) = 
2 



2^ 



oo foo I foo / rl 

t n ~ l exp I -i / |6i0i(x) H h b n -i<t> n -i(x) + n (x) 

— oo J -co \ JO \ JO 



cos (t(6iXi + • • • + 6 n _!X n _i + x n )) dt dbi . . . db. 



'n-1- 



Note that the inner integral has the same form as (|16p and hence we have 
( n _ i)l r°° roc i i 

- • • ■ *») - y_ • • • jx^bt + UTW dbl ■ ■ ■ d6 - 1 

where ^4 and I? are given by (|14p and (|15p . The last equality in (|18p follows from the fact 
that the two summands in the integral are conjugate complex numbers. ■ 

Now we apply Lemma 16. II for the case of two functions, one constant and one linear. 

Proof of Theorem 14. 2t 

Plugging n = 2, (f>i(x) = 1, and 02 0*0 = x into ([Till and (fl~5j) we obtain 

-B(6i,xi,x 2 ) = feixi + x 2 (19) 

and 

r 6i + 1/2 if 6 X > 0, 
A(bi) = I -h- 1/2 if 6i < -1, (20) 
I b\ + bi + 1/2 otherwise. 

Our goal now is to evaluate the integral (fl~3|) . We split the integral into 3 parts according 
to the behavior of A(b\). 
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We will use the following integral 

1 i 1 

(Sz + T) 2 dz = ~S{T + Sx)' (21) 

For B = b\X\ + x 2 and A = bi + 1/2 we have A + iB = b\(l + ix\) + (1/2 + 1x2). Using 
([H]) for A and 5 given by (US]) and {20])) we obtain 

1 2 

r db i = T r-TT, ( 22 ) 



o (,4 + iB) 2 (ixi + l)(2ix 2 + l)' 
and 

- 1 1 _ 2 

(A-iB) 2 ^ 1 " (ixi-l)(2i(xi-x 2 )-l)' (23) 
We have (see, e.g., [GRQ7])) 



/ (z 2 + 5; 



q , o 7 4atan ( (5 + 2z)/V4T - S 2 

Sz + T) 2 (4T-S 2 )(T + Sz + z 2 )^ (AT - S 2 fl 2 ' 



For A = b\ + 61 + 1/2 and 5 = b x x\ + x 2 we have A + iB = b\ + b 1 (l+ixi) + (1/2 + x 2 ). 
Using (j24"|) we obtain 

2(ix! + 1) ^ 2(zx! - 1) ^ . atan (w) - atan ( W 



(A + iB) 2 (2ix 2 + l)Q (2i(xi - x 2 ) - 1)Q Q 3 / 2 

where Q is given by ©. 

Summing ([22j), ([H, and ([25]) we obtain 



(25) 



1 a ai>aji — 7=- — awn — 7=- 
1 d6 8 +4 ^ ^ L— \^L. (26) 



atan[%±i] - atan ' si^l 



(,4 + iB) 2 1 Q(l + xf) Q3/2 

(l + xf 



We have 

iXi±l 4 M_l_„^2 



(l + x 2 ) 2 + (2xi -4x 2 ) 2 



< 1. 



with equality only if x\ = 2x 2 . Hence if X\ 7^ 2x 2 then using ()42p we have 

/ ix\ + 1\ / ixi — 1\ „ 

atan I — -=— I — atan I — -=— I = atan(iQ/(xi — 2x 2 )J, 

V VQ J \ VQ J 

and by applying 
in (23) we obtain ©. 



Q(l + x 2 ) / 1 + 6x 2 + x| — 16xix 2 + I6X2 



If xi = 2x 2 then Q = 1 + x\ and using 



atan ( Xl J_ J — atan [ 1 ) = ir/2 
VQ J V vv 



in d5BJ we obtain ©. 
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6.2 Bounding the Cli(0, l)-distribution 

Now we prove that the multivariate student distribution gives an efficient envelope for the 
Cli(0, ^-distribution. 

Proof of Lemma 14. 3t 

To simplify the formulas we use the following substitutions: x\ = u and X2 = w + u/2. 
The density g becomes 

g'(u,v) :=-(l + u 2 + Aw 2 y 3/2 ■ 
For w = (which corresponds to x\ = Ix-i) the density / becomes 

4/tt 2 1 

(l +n 2)2 + ^(1+^)3/2' V 7 > 

and hence Lemma 14.31 is true, as 

m < (4/tt + 1) 0(i+«t 3/2 

For density © becomes 

, 1 / 4 atan(iM/(2to)) atan(iM'/(2io)) \ 

/ («> U) :_ ^2 \ v (l + n 2 ) 2 + (4w;) 2 + W W> J ' 

where M = (1 + u 2 — Aiw) 1 / 2 and M' = (1 + u 2 + Aiw) 1 ^ 2 . We are going to show 

7T 2 f(u,v)<C7Tg'( U ,v). (28) 



Note that both sides of (|28p are unchanged when we flip the sign of u or the sign of 
w. Hence we can, without loss of generality, assume u > and w > 0. 

There are unique a > and b > such that io = a&/2 and it = yj a 2 — b 2 — 1 (to see this 
notice that substituting b = 2w/a into the second equation yields u 2 + 1 = a 2 — Aw 2 /a 2 , 
where the right-hand side is a strictly increasing function going from — oo to oo). Note 
that M = a — ib and M' = a + ib. Also note that 

a 2 > b 2 + 1. (29) 

After the substitution equation (|28p simplifies as follows 



4 1 // -,N3 f 1 i 

+ — -= r^-o (a + so) atan — h 



{a 2 + b 2 ) 2 {a 2 + b 2 fy ' \a b 



+ (a - z6) d atan < 



(30) 



a bj J ~ (a 2 -b 2 + a 2 b 2 fl 2 ' 
Now we expand (a + ib) s and (a — ib) 3 and simplify (|30p into 

+ —2 r^-? ( a ~~ 3ao ) I atan I — \- - ] + atan I - 



(a 2 + b 2 ) 2 {a 2 + b 2 ) 3 y '\ \a b 

-i(b 6 - 3a z b) 



atan — h - — atan < 



V \a bj \a bj J J ~ (a 2 -b 2 + a 2 b 2 ) 3 / 2 ' 

(31) 
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Now we substitute a = 1/A and b = 1/B into (|3ip and obtain 

4A 4 B A A 3 B 3 ( 

+ —2 ^3 (- b3 ~ S^ 2 - 8 ) ( atan ( A + iB ) + atan ( A ~ iB ) ) 



(A 2 + B 2 ) 2 [A 2 + B 2 f 

\ C ■ A 3 B 3 

-i(A 3 - 3ylB 2 )(atan (A + iB) - atan {A-iB))\< — _ - - /2 . 

(32) 

Note that yl > and B > and the constraint (|29|) becomes 

£? 2 > A 2 (l +B 2 ). (33) 
Multiplying both sides of ([32]) by (yl 2 + B 2 ) 3 /(AB) 3 we obtain 
AAB(A 2 + B 2 ) + (5 3 - 3A 2 B) (atan (yl + »J3) + atan (A - iB) ) 

r ■ ( A 2 + R 2 "i 6 f34) 
-i(yl 3 - 3yl£ 2 ) (atan (A + iB) - atan (A - iB) ) < ^ i ^ + '> . 

Finally, we substitute A = Tsina and B = Tcosa with T > 0. Note that the constraint 
([33]) becomes 

• -,2 ^ cos(2a) 

rsma < t ttt, (35) 

(cosa)^ 

and hence a is restricted to [0, 7r/4). 
Equation (|34|) then becomes 



2T 4 sin(2o) + T 3 cos(3q) (atan (yl + iB) + atan (yl - iB)) 
+iT 3 sin(3a) (atan (yl + iB) - atan (yl - iB)) < 



, _ , C-T 6 (36) 



(T 2 cos(2a) + l) 3 / 2 ' 

We prove ([36]) by considering three cases. 

CASE: T < 1 . We can use (pj to simplify ([36]) as follows 

m , , , . /2Tsin(a)\ . . , /2Tcos(a)\ C-T 3 

2Tsm(2a) + cos(3a) atan ^ - _ — j - sm(3a) atanh ^ - - y2 j < — — ^ - i)3/2 . 

(37) 

For 2>0we have atanh(z) > z > atan(z) and hence to prove (|37p it is enough to show 
2Tsin(2a)(l — T 4 ) + (1 + T 2 ) cos (3a) (2Tsin(a)) 

C-T 3 (l-T 4 ) (38) 

which is implied by the following inequality which holds for T < 8/9: 

. . ,„ , . . C- 2465/6561 C-(l-T 4 ) 

- 2T* sm( 2a ) + 2 S m(4 a ) < 2 < J < ^_L_^ . (39) 
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For 1 > T > 8/9 we directly prove ([38 

{2Tsm(a)\ ( 2T cosia 

2Tsm(2a) + cos(3a) atari I — ^— I — sm(3a) atanh 



1-T 2 J y ' \ 1 + T 2 

< 2 + , /2 <^L^/™< o.i* 



2 3/2 - (T 2 cos(2a) + 1) 3 / 2 ' 

CASE: T > 1. We can use (03]) and (02) to simplify fl36|) as follows 

/ / 2Tsin(a) 

2Tsin(2a) + cos(3a) I 7r + atan 



1-T 2 

, . , /2Tcos(a)\ C-T 3 
sm(3a) atann I — — — < 



(40) 



1 + T 2 J ~ (T 2 cos(2a) + l) 3 / 2 ' 



From (|35h we have Tsin(a) < 1 and hence 2Tsin(2a) < 4. Therefore (I40p can be proved 
as follows. 

m . ,„ , . . / /2Tsin(a)\\ , . , /2Tcos(q) 

2Tsm(2a) + cos(3a) I tt + atan ( — ^— I I — sm(3a) atanh 1 



1-T 2 J J y ' \ 1 + T 2 

C C ■ T 3 

< 4 + 3^/2 < — < 



2 3/2 - ( T 2 cos (2a) + l) 3 / 2 ' 
CASE: T = 1. Equation (J3B} simplifies as follows 

C 

2sin(2a) + (vr/2) cos(3a) — sin(3a) atanh (cos(a)) < — tta7- (41) 

(cos(2q) + l) 3 / 2 

The left-hand side is bounded from above by 2 + ir/2 which is less than C/2 3 / 2 which 
lower-bounds the right-hand side of (|4ip . ■ 



6.3 Basic properties of trigonometric functions 

In this section we list the basic properties of trigonometric functions that we used. For 
complex parameters these are multi-valued functions for which we choose the branch in 
the standard way. The logarithm of a complex number z = (cos a + isina)e*, where 
a £ (— 7r, 7r], and t E K is ia + t. The inverse tangent of a complex number z£C \ 
is the solution of tan(x) = z with 3f?(x) G (— n/2, 7r/2). In terms of the logarithm we have 

atan(z) := — i (ln(l — iz) — ln(l + iz)) . 
The inverse hyperbolic tangent function is defined analogously, for z 6 C \ {±1} we have 

atanh(z) := - (ln(l + z) — ln(l — z)) = — i atan(iz). 
For non-negative real numbers z we have the following inequality 

atanh(z) > z > atan(z). 
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The atan function (even as a multi- valued function) satisfies 

x + y 



tan (atan (x) + atan(y)) 



xy 



for any values of x, y £ C \ {±i}, with xy / 1. 

For a 2 + b 2 < 1 the real part of atan(a + bi) is from (— vr/4, vr/4). Hence 

\x\ < 1 A 1 2/ 1 < 1 =>■ atan(x) + atan(y) = atan I . (42) 



1- xy 

For a > and a 2 + b 2 > 1 the real part of atan(a + bi) is from [vr/4, vr/2). 

a>0Aa 2 + fe 2 >l atan(a + bi) + atan(a - bi) = ir + atan(2a/(l - a 2 - b 2 )). (43) 

For a > the real part of atan(a + bi) is from [0, tt/2). Hence for any a, b with a + ib / ±i 
we have 

atan(a + bi) — atan(a — bi) = atan ( ^ — To ) • (44) 
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